The traveling salesman problem (TSP) consists of finding the length of the shortest closed tour visiting N "cities." We consider the Euclidean TSP where the cities are distributed randomly and independently in a d-dimensional unit hypercube. Working with periodic boundary conditions and inspired by a remarkable universality in the kth nearest neighbor distribution, we find for the average optimum tour length $\langle L_E\rangle = \beta_E(d)\,N^{1-1/d}[1+O(1/N)]$, with $\beta_E(2) = 0.7120 \pm 0.0002$ and $\beta_E(3) = 0.6979 \pm 0.0002$. We then derive analytical predictions for these quantities using the random link approximation, where the lengths between cities are taken as independent random variables. From the "cavity" equations developed by Krauth, Mezard and Parisi, we calculate the associated random link values $\beta_{RL}(d)$. For d = 1, 2, 3, numerical results show that the random link approximation is a good one, with a discrepancy of less than 2.1% between $\beta_E(d)$ and $\beta_{RL}(d)$. For large d, we argue that the approximation is exact up to $O(1/d^2)$ and give a conjecture for $\beta_E(d)$ in terms of a power series in 1/d, specifying both leading and subleading coefficients.
1 Introduction



Given N "cities" and the distances between them, the traveling salesman problem (TSP) consists of finding the length of the shortest closed "tour" (path) visiting every city exactly once, where the tour length is the sum of the city-to-city distances along the tour. The TSP is NP-complete, which suggests that there is no general algorithm capable of finding the optimum tour in an amount of time polynomial in N. The problem is thus simple to state, but very difficult to solve. It is also the best-known combinatorial optimization problem, and has attracted interest from a wide range of fields. In operations research, mathematics and computer science, researchers have concentrated on algorithmic aspects. A particular focus has been on heuristic algorithms (algorithms which do not guarantee optimal tours) for cases where exact methods are too slow to be of use. The most effective heuristics are based on local search methods, which start with a non-optimal tour and iteratively improve it within a well-defined "neighborhood"; a famous example is the Lin-Kernighan heuristic [1]. More recent efforts have combined local search with non-deterministic methods in order to refine heuristics to the point where they give good enough solutions for practical purposes; a powerful such technique is Chained Local Optimization [2].
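The idea of improving a tour "within a neighborhood" can be made concrete with a minimal sketch. The Python below implements 2-opt, a simple local search that is related to, but much weaker than, Lin-Kernighan (it is not LK or CLO themselves): a move deletes two links and reconnects the tour by reversing the segment between them, and the search stops when no move shortens the tour, i.e. at a local optimum of this neighborhood. The periodic unit square, the function names and the 50-city random instance are our own illustrative assumptions.

```python
import math
import random


def torus_dist(a, b):
    """Euclidean distance in the unit square with periodic boundary conditions."""
    dx = abs(a[0] - b[0])
    dy = abs(a[1] - b[1])
    return math.hypot(min(dx, 1.0 - dx), min(dy, 1.0 - dy))


def tour_length(tour, pts):
    """Sum of city-to-city distances along the closed tour."""
    n = len(tour)
    return sum(torus_dist(pts[tour[i]], pts[tour[(i + 1) % n]]) for i in range(n))


def two_opt(tour, pts):
    """2-opt local search: reverse a segment whenever that shortens the tour."""
    tour = list(tour)
    n = len(tour)
    improved = True
    while improved:
        improved = False
        for i in range(n - 1):
            for j in range(i + 2, n):
                if (j + 1) % n == i:
                    continue  # the two links share a city: not a valid 2-opt move
                a, b = pts[tour[i]], pts[tour[i + 1]]
                c, e = pts[tour[j]], pts[tour[(j + 1) % n]]
                # Length change if links (a,b) and (c,e) become (a,c) and (b,e).
                delta = (torus_dist(a, c) + torus_dist(b, e)
                         - torus_dist(a, b) - torus_dist(c, e))
                if delta < -1e-12:
                    tour[i + 1:j + 1] = reversed(tour[i + 1:j + 1])
                    improved = True
    return tour


# Usage: start from an arbitrary tour (cities in random order) and improve it.
rng = random.Random(1)
pts = [(rng.random(), rng.random()) for _ in range(50)]
start = list(range(50))
best = two_opt(start, pts)
print(tour_length(start, pts), tour_length(best, pts))
```

Each accepted move strictly shortens the tour, so the loop terminates; LK differs mainly in chaining a variable number of such exchanges into one move.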

Over the last fifteen years, physicists have increasingly been drawn to the TSP as well, and particularly to stochastic versions of the problem, where instances are randomly chosen from an ensemble. The motivation has often been to find properties applicable to a large class of disordered systems, either through good approximate methods or through exact analytical approaches. In our work, we consider two such stochastic TSPs. The first, the Euclidean TSP, is the more classic form of the problem: N cities are placed randomly and independently in a d-dimensional hypercube, and the distances between cities are defined by the Euclidean metric. The second, the random link TSP, is a related problem developed within the context of disordered systems: rather than specifying the positions of cities, we specify the lengths $l_{ij}$ separating cities i and j, where the $l_{ij}$ are taken to be independent, identically distributed random variables. The appeal of the random link problem is, on the one hand, that an analytical approach exists for solving it [3, 4], and on the other hand, that when certain correlations are neglected this TSP can be made to resemble the Euclidean TSP. We therefore treat the random link problem as a random link approximation to the (random point) Euclidean problem. Researchers outside of physics remain largely unaware of the analytical progress made on the random link TSP; one of our hopes is to demonstrate how these results are of direct interest in problems where the aim is to find the optimum Euclidean TSP tour length.

Our approach in this paper is then to examine both the Euclidean problem and the random link problem, the latter for its own theoretical interest as well as for a better understanding of the Euclidean case. We begin by considering in depth the Euclidean TSP, including a review of previous work. We find that, given periodic boundary conditions (toroidal geometry), the Euclidean optimum tour length $L_E$ averaged over the ensemble of all possible instances has the finite size scaling behavior

$$\langle L_E\rangle = \beta_E(d)\,N^{1-1/d}\left[1+O\!\left(\frac{1}{N}\right)\right]. \qquad (1)$$

From simulations, we extract very precise numerical values for $\beta_E(d)$ at d = 2 and d = 3; methodological and numerical procedures are detailed in the appendices. We also give numerical evidence that the probability distribution of $L_E$ becomes Gaussian in the large N limit. In addition to these TSP results, we find a surprising universality in the scaling of the mean distance between kth nearest neighbors, for points randomly distributed in the d-dimensional hypercube. Finally, we discuss the expected behavior of $\beta_E(d)$ in the large d limit.

In the second part of the paper we discuss the random link problem, considering it as an approximation to the Euclidean problem. Making use of the cavity method, we compare the random link $\beta_{RL}(d)$ with the Euclidean $\beta_E(d)$ values obtained from our simulations. We find that the random link approximation is correct to within 2% at d = 2 and 3. The rest of the section studies the large d limit of the random link model and its implications for the Euclidean TSP. We examine analytically how $\beta_{RL}(d)$ scales at large d, and we relate the 1/d coefficient of the associated power series to an underlying d-independent "renormalized" model. Finally, we present a theoretical analysis based on the Lin-Kernighan heuristic, suggesting strongly that the relative difference between $\beta_{RL}(d)$ and $\beta_E(d)$ is positive and of $O(1/d^2)$. The random link results then lead to our large d Euclidean conjecture:

where γ is Euler's constant.
2 The Euclidean TSP



2.1 Scaling at large N 

One of the earliest analytical results for the Euclidean TSP is due to Beardwood, Halton and Hammersley [5] (BHH). The authors considered N cities, distributed randomly and independently in a d-dimensional volume with distances between cities given by the Euclidean metric. They showed that, when the volume is the unit hypercube and the distribution of cities uniform, $L_E/N^{1-1/d}$ is self-averaging. This means that with probability 1,

$$\lim_{N\to\infty}\frac{L_E}{N^{1-1/d}} = \beta_E(d), \qquad (3)$$

where $\beta_E(d)$ is independent of the randomly chosen instances. This property is illustrated in Figure 1. (In fact, the BHH result is more general than this and concerns an arbitrary volume and arbitrary form of the density of cities.) For a physics audience this large N limit is equivalent, in appropriate units, to an infinite volume limit at constant density. $L_E/N^{1-1/d}$ then corresponds to an energy density that is self-averaging and has a well-defined infinite volume limit. The original proof by BHH is quite complicated; simpler proofs have since been given by Karp and Steele [6, 7].

One of our goals is to determine $\beta_E(d)$. BHH gave rigorous lower and upper bounds as a function of dimension. For any given instance, a trivial lower bound on $L_E$ is the sum over all cities i of the distance between i and its nearest neighbor in space. In fact, since a tour at best links a city with its two nearest neighbors, this bound can be improved upon by summing, over all i, the mean of i's nearest and next-nearest neighbor distances. Taking the ensemble average of this quantity (that is, the average over all instances) leads to the best analytical lower bound to date. For upper bounds, BHH introduced a heuristic algorithm, now known as "strip," in order to generate near-optimal tours (discussed also in a paper by Armour and Wheeler [8]). In two dimensions the method involves dividing the square into adjacent columns or strips, and sequentially visiting the cities on a given strip according to their positions along it. The respective lower and upper bounds give $0.6250 \le \beta_E(2) \le 0.9204$.
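Both BHH constructions are easy to state in code. The sketch below is our own illustration (the function names, the periodic unit square and the choice of 12 strips for 400 cities are assumptions): it computes the instance-level lower bound, half the sum over cities of nearest plus next-nearest neighbor distances, which holds because each city contributes two tour links and every link is shared by two cities, and it builds a strip tour by sweeping alternately up and down the columns. The bounds quoted above are large N ensemble limits, so a single instance only illustrates the ordering lower ≤ upper.

```python
import math
import random


def torus_dist(a, b):
    """Euclidean distance in the unit square with periodic boundary conditions."""
    dx = abs(a[0] - b[0])
    dy = abs(a[1] - b[1])
    return math.hypot(min(dx, 1.0 - dx), min(dy, 1.0 - dy))


def tour_length(tour, pts):
    n = len(tour)
    return sum(torus_dist(pts[tour[i]], pts[tour[(i + 1) % n]]) for i in range(n))


def nn_lower_bound(pts):
    """Sum over cities of the mean of nearest and next-nearest neighbor distances."""
    total = 0.0
    for i, p in enumerate(pts):
        d = sorted(torus_dist(p, q) for j, q in enumerate(pts) if j != i)
        total += 0.5 * (d[0] + d[1])
    return total


def strip_tour(pts, n_strips):
    """Strip heuristic: cut the square into vertical strips and visit the cities
    of each strip in order of y, alternating upward and downward sweeps."""
    strips = [[] for _ in range(n_strips)]
    for idx, (x, _) in enumerate(pts):
        strips[min(int(x * n_strips), n_strips - 1)].append(idx)
    tour = []
    for s, members in enumerate(strips):
        members.sort(key=lambda i: pts[i][1], reverse=(s % 2 == 1))
        tour.extend(members)
    return tour


rng = random.Random(2)
N = 400
pts = [(rng.random(), rng.random()) for _ in range(N)]
lower = nn_lower_bound(pts)
upper = tour_length(strip_tour(pts, n_strips=12), pts)
print(lower / math.sqrt(N), upper / math.sqrt(N))
```

Dividing both numbers by $N^{1/2}$ puts them on the same footing as $\beta_E(2)$; the number of strips is a free parameter of the construction.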

In addition to bounds, it is possible to obtain numerical estimates for $\beta_E(d)$. BHH used two instances, N = 202 and N = 400, from which they estimated $\beta_E(2) \approx 0.749$ using hand-drawn tours. Surprisingly little has been done to improve upon this value in two dimensions, and essentially nothing in higher dimensions. Stein [9] has found $\beta_E(2) \approx 0.765$, which is frequently cited. Only recently have better values been obtained, but as they come from near-optimal tours found by heuristic algorithms, they should be considered more as upper bounds than as estimates. Using a local search heuristic known as "3-opt" [10], Ong and Huang [11] have found $\beta_E(2) < 0.743$; using another heuristic, "tabu" search, Fiechter has found $\beta_E(2) < 0.731$; and using a variant of simulated annealing, Lee and Choi [13] have found $\beta_E(2) < 0.721$. In what follows we show what is needed for a more precise estimate of $\beta_E(d)$, together with a way to quantify the associated error.

Figure 1: Self-averaging of the 2-D Euclidean TSP optimum: convergence of $L_E(N,2)/N^{1/2}$ on a sequence of random instances at increasing N.

2.2 Extracting $\beta_E(d)$

As $N \to \infty$, $L_E/N^{1-1/d}$ converges with probability 1 to the instance-independent $\beta_E(d)$. Our estimate of $\beta_E(d)$ must rest on some assumptions, though, since only finite values of N are accessible numerically. Note first that at values of N where computation times are reasonable, $L_E$ has substantial instance-to-instance fluctuations. To reduce and at the same time quantify these fluctuations, we average over a large number of instances. We thus consider the numerical mean of $L_E$ over the instances sampled, which itself satisfies the asymptotic relation (3) but with a smoother convergence. To extract $\beta_E(d)$, we must understand precisely what this convergence in N is.

If cities were randomly distributed in the hypercube with open boundary conditions, the cities near the boundaries would have fewer neighbors and therefore lengthen the tour. In standard statistical mechanical systems at constant density, boundary effects lead to corrections of the form surface over volume. For the TSP at constant density, the volume grows as N and the surface as $N^{1-1/d}$, so the relative correction is of order $N^{-1/d}$. In a d-dimensional unit hypercube, then, the ensemble average of $L_E$ would presumably have the large N behavior

$$\langle L_E\rangle = \beta_E(d)\,N^{1-1/d}\left[1+\frac{A}{N^{1/d}}+\cdots\right]. \qquad (4)$$

In order to extract $\beta_E(d)$ numerically, it would be necessary to perform a fit which includes these corrections. A reliable numerical fit, however, must have few adjustable parameters, and the slow convergence of this series would prevent us from extracting $\beta_E(d)$ to high accuracy. We have therefore chosen to eliminate these boundary (surface) effects by using periodic boundary conditions in all directions. This should not change $\beta_E(d)$, but leaves us with fewer adjustable parameters and a faster convergence, enabling us to work with smaller values of N where numerical simulations are not too slow.

For the hypercube with periodic boundary conditions, let us introduce the notation

$$\beta_E(N,d) \equiv \frac{\langle L_E\rangle}{N^{1-1/d}}, \qquad (5)$$

where $\langle L_E\rangle$ is the average of $L_E$ over the ensemble of instances. ($\beta_E(N,d)$ is, in physical units, the zero-temperature energy density.) We then wish to understand how $\beta_E(N,d)$ converges to its large N limit, $\beta_E(d)$. In standard statistical mechanical systems, there is a characteristic correlation length ξ. Away from a critical point, ξ is finite, and finite size corrections decrease as $e^{-W/\xi}$, where W is a measure of the system "width." At a critical point, ξ is infinite, and finite size corrections decrease as a power of 1/W. For disordered statistical systems, however, this picture must be modified. Even if ξ is finite for each instance in the ensemble, the fluctuating disorder can still give rise to power-law corrections for ensemble averaged quantities. In the case of the TSP, this is particularly clear: the disorder in the positions of the cities induces large finite size effects even on simple geometric quantities.
To see how this might affect the convergence of $\beta_E(N,d)$, consider the following. For a given configuration of N points, call $D_k(N,d)$ the distance between a point and its kth nearest neighbor, where k = 1, ..., N − 1. Take the points to be distributed randomly and uniformly in the unit hypercube, and let us find $\langle D_k(N,d)\rangle$. Under periodic boundary conditions, the probability density $p(l)$ of finding a point at distance l from another point is simply equal (for $0 \le l \le 1/2$) to the surface area at radius l of the d-dimensional sphere:

$$p(l) = \frac{d\,\pi^{d/2}}{\Gamma(d/2+1)}\,l^{d-1}. \qquad (6)$$

The probability density of finding a point's kth nearest neighbor at distance l (see Figure 2) is equal to the probability of finding k − 1 (out of N − 1) points within l, one point at l and the remaining N − k − 1 points beyond l. Writing the volume of the d-dimensional ball of radius l as

$$V_d(l) = \frac{\pi^{d/2}}{\Gamma(d/2+1)}\,l^{d}, \qquad (7)$$

this density is

$$P_k(l) = \binom{N-1}{k-1}(N-k)\,[V_d(l)]^{k-1}\,p(l)\,[1-V_d(l)]^{N-k-1}. \qquad (8)$$

Figure 2: A point's N − 1 neighbors: k − 1 nearest neighbors are within distance l, the kth nearest neighbor is at l, and the remaining N − k − 1 points are beyond l.
This gives the ensemble average

$$\langle D_k(N,d)\rangle = \binom{N-1}{k-1}(N-k)\,d\left[\frac{\pi^{d/2}}{\Gamma(d/2+1)}\right]^{k}\int_0^{1/2} l^{dk}\left[1-\frac{\pi^{d/2}}{\Gamma(d/2+1)}\,l^{d}\right]^{N-k-1}dl+\cdots, \qquad (9)$$

where the corrections are due to the $l > 1/2$ case, and are exponentially small in N.

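Recognizing the integral, up to the change of variable $t = \pi^{d/2}l^d/\Gamma(d/2+1)$, as a Beta function, $B(a,b) = \int_0^1 t^{a-1}(1-t)^{b-1}\,dt = \Gamma(a)\Gamma(b)/\Gamma(a+b)$, plus a further remainder term exponentially small in N, we see that

$$\langle D_k(N,d)\rangle = \frac{\Gamma(d/2+1)^{1/d}}{\pi^{1/2}}\,\frac{\Gamma(k+1/d)}{\Gamma(k)}\,\frac{\Gamma(N)}{\Gamma(N+1/d)}+\cdots. \qquad (10)$$

Expanding the last ratio at large N gives

$$\langle D_k(N,d)\rangle = \frac{\Gamma(d/2+1)^{1/d}}{\pi^{1/2}}\,\frac{\Gamma(k+1/d)}{\Gamma(k)}\,N^{-1/d}\left[1+\frac{d-1}{2d^2N}+O\!\left(\frac{1}{N^2}\right)\right]. \qquad (11)$$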



We are confronted here with a remarkable, and hitherto unexplored, universality: the exact same 1/N series gives the N-dependence regardless of k. The same finite size scaling behavior therefore applies to all kth nearest neighbor distances.
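Equations (10) and (11) are easy to check numerically. The sketch below is our own (the function names are hypothetical): it evaluates (10) with `math.lgamma`, verifies that the ratio of $N^{1/d}\langle D_k(N,d)\rangle$ to its $N\to\infty$ limit is the same for every k, compares that common factor with $1 + (d-1)/(2d^2N)$, which is $1 + 1/(8N)$ at d = 2, and runs a small Monte Carlo estimate of $\langle D_1(50,2)\rangle$ on the periodic unit square (200 instances, an arbitrary choice).

```python
import math
import random


def mean_kth_nn(k, N, d):
    """<D_k(N,d)> from (10): N uniform points in the periodic unit hypercube,
    exponentially small corrections dropped."""
    return math.exp(math.lgamma(d / 2 + 1) / d - 0.5 * math.log(math.pi)
                    + math.lgamma(k + 1 / d) - math.lgamma(k)
                    + math.lgamma(N) - math.lgamma(N + 1 / d))


def nn_limit(k, d):
    """lim_{N -> inf} N^{1/d} <D_k(N,d)>."""
    return math.exp(math.lgamma(d / 2 + 1) / d - 0.5 * math.log(math.pi)
                    + math.lgamma(k + 1 / d) - math.lgamma(k))


def universal_factor(N, d):
    """N^{1/d} Gamma(N) / Gamma(N + 1/d): the finite-size factor, the same for every k."""
    return math.exp(math.log(N) / d + math.lgamma(N) - math.lgamma(N + 1 / d))


def torus_dist(a, b):
    """Euclidean distance in the unit square with periodic boundary conditions."""
    dx, dy = abs(a[0] - b[0]), abs(a[1] - b[1])
    return math.hypot(min(dx, 1 - dx), min(dy, 1 - dy))


# Universality: the ratio to the N -> infinity value does not depend on k.
N, d = 100, 2
ratios = [N ** (1 / d) * mean_kth_nn(k, N, d) / nn_limit(k, d) for k in range(1, 6)]
print(ratios, 1 + (d - 1) / (2 * d * d * N))  # compare with the 1/N term of (11)

# Monte Carlo estimate of <D_1(50, 2)> from 200 random instances.
rng = random.Random(3)
n_pts, samples = 50, []
for _ in range(200):
    pts = [(rng.random(), rng.random()) for _ in range(n_pts)]
    for i, p in enumerate(pts):
        samples.append(min(torus_dist(p, q) for j, q in enumerate(pts) if j != i))
mc_mean = sum(samples) / len(samples)
print(mc_mean, mean_kth_nn(1, n_pts, 2))
```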

It might be hoped, then, that the typical link length in optimum tours would have this N-dependence, and that $\beta_E(N,d)$ would therefore have the same 1/N expansion. This is not quite the case. The link between cities i and j figures in the average $\langle D_k(N,d)\rangle$ whenever j is the kth neighbor of i; it figures in $\beta_E(N,d)$, however, only when it belongs to the optimal tour. Two different kinds of averages are being taken, and so finite size corrections need not be identical. Nevertheless, it remains plausible that $\beta_E(N,d)$ has a 1/N series expansion, albeit a different one from (11). While we cannot prove this property, it is confirmed by an analysis of our numerical data.

Our approach to finding $\beta_E(d)$ is thus as follows: (i) we consider the ensemble average $\langle L_E\rangle$, rather than $L_E$ for a given instance, in order to have a quantity with a well-defined dependence on N; (ii) we use periodic boundary conditions to eliminate surface effects; (iii) we sample the ensemble using numerical simulations, and measure $\beta_E(N,d)$ within well-controlled errors; (iv) we extract $\beta_E(d)$ by fitting these values to a 1/N series.

2.3 Finite size scaling results 

Let us consider the d = 2 case in detail. We found the most effective numerical optimization methods for our purposes to be the local search heuristics Lin-Kernighan (LK) [1] and Chained Local Optimization (CLO) [2] mentioned in the introduction. Both heuristics, by definition, give tour lengths that are not always optimal. However, it is not necessary that the optimum be found 100% of the time: there is already a significant statistical error arising from instance-to-instance fluctuations, and so a further systematic error due to non-optimal tours is acceptable as long as it is kept negligible compared to the statistical error. Our methods, along with relevant numerical details, are discussed in the appendices. For the present purposes, let us simply describe the general nature of the two heuristics used. LK performs a "variable-depth" local search, as discussed further in Section 3.6. CLO iterates a process combining LK optimizations with random perturbations to the tour, in order to explore many different local neighborhoods. We used LK for "small" N values (N ≤ 17), averaging over 250,000 instances at each value of N, and we used CLO for "large" N values (N = 30 and N = 100), averaging over 10,000 and 6,000 instances respectively.

We fitted our resulting $\beta_E(N,2)$ estimates to a truncated 1/N series: the fits are good, and are stable with respect to the use of sub-samples of the data. For a fit of the form $\beta_E(N,d) = \beta_E(d)(1 + A/N + B/N^2)$, we find $\beta_E(2) = 0.7120 \pm 0.0002$, with $\chi^2 = 5.57$ for 8 data points and 3 fit parameters (5 degrees of freedom). Our error estimate for $\beta_E(2)$ is obtained by the standard method of performing fits using a range of fixed values for this parameter: the error bar ±0.0002 is determined by the values of $\beta_E(2)$ which make $\chi^2$ exceed its original result by exactly 1, i.e., making $\chi^2 = 6.57$ in this case.

Figure 3: Finite size dependence of the rescaled 2-D Euclidean TSP optimum. Best fit ($\chi^2 = 5.56$) gives $\beta_E(N,2)/[1 + 1/(8N) + \cdots] = 0.7120\,(1 - 0.0171/N - 1.048/N^2)$. Error bars represent statistical errors.
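The fit and its error bar can be sketched in a few lines of Python. Because $\beta_E(d)(1 + A/N + B/N^2) = c_0 + c_1/N + c_2/N^2$ is linear in $(c_0, c_1, c_2)$, and fixing $\beta_E(d) = c_0$ while refitting A and B is equivalent to refitting $c_1$ and $c_2$, the "$\chi^2$ rises by 1" error on $\beta_E(d)$ equals the square root of the corresponding diagonal element of the covariance matrix. The data below are synthetic and noise-free, generated from hypothetical parameters (not the measured values), so the fit should simply recover them; the list of N values and the error bars are likewise assumptions.

```python
import math


def invert(m):
    """Gauss-Jordan inverse (with partial pivoting) of a small square matrix."""
    n = len(m)
    a = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        p = a[col][col]
        a[col] = [x / p for x in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [x - f * y for x, y in zip(a[r], a[col])]
    return [row[n:] for row in a]


def fit_beta(Ns, betas, sigmas):
    """Weighted least squares for beta(N) = c0 + c1/N + c2/N**2.

    Returns (beta, A, B, beta_err, chi2) with beta = c0, A = c1/c0, B = c2/c0.
    For this linear model the Delta chi^2 = 1 interval on c0 (refitting c1
    and c2 at each fixed c0) is +/- sqrt(Cov[0][0]).
    """
    M = [[0.0] * 3 for _ in range(3)]
    v = [0.0] * 3
    for N, b, s in zip(Ns, betas, sigmas):
        phi = (1.0, 1.0 / N, 1.0 / N**2)
        w = 1.0 / s**2
        for r in range(3):
            v[r] += w * phi[r] * b
            for c in range(3):
                M[r][c] += w * phi[r] * phi[c]
    cov = invert(M)  # inverse of the normal matrix = parameter covariance
    c0, c1, c2 = (sum(cov[r][c] * v[c] for c in range(3)) for r in range(3))
    chi2 = sum(((b - (c0 + c1 / N + c2 / N**2)) / s) ** 2
               for N, b, s in zip(Ns, betas, sigmas))
    return c0, c1 / c0, c2 / c0, math.sqrt(cov[0][0]), chi2


# Hypothetical, noise-free synthetic data; not the measured values.
Ns = [12, 13, 14, 15, 16, 17, 30, 100]
true_beta, true_A, true_B = 0.70, -0.02, -1.0
betas = [true_beta * (1 + true_A / N + true_B / N**2) for N in Ns]
sigmas = [1e-4] * len(Ns)
beta, A, B, beta_err, chi2 = fit_beta(Ns, betas, sigmas)
print(beta, A, B, beta_err, chi2)
```

With real data the same function returns a nonzero $\chi^2$ that can be compared with the number of degrees of freedom (here 8 − 3 = 5).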

It is possible to extract another estimate of $\beta_E(2)$ by making direct use of the universality discussed previously: the universal 1/N series in (11) suggests that convergence will be faster if we use the rescaled data $\beta_E(N,2)/[1 + 1/(8N) + \cdots]$. This also has the appealing property of leading to a function monotonic in N, as shown in Figure 3. We find

$$\frac{\beta_E(N,2)}{1+1/(8N)+\cdots} = 0.7120\left(1-\frac{0.0171}{N}-\frac{1.048}{N^2}\right), \qquad (12)$$

with the leading term having the same error bar of ±0.0002 as before. Note that the 1/N term in the fit is small, two orders of magnitude smaller than the leading order coefficient, and so to first order the $1 + 1/(8N) + \cdots$ series is itself a good approximation.

The same methodology was applied to the d = 3 case. The $\chi^2$ values again confirmed the functional form of the fit, and we find from our data $\beta_E(3) = 0.6979 \pm 0.0002$. Also, since our initial work [14], Johnson et al. have performed simulations at d = 2, 3, 4, obtaining results [15] consistent with ours: $\beta_E(2) \approx 0.7124$, $\beta_E(3) \approx 0.6980$ and $\beta_E(4) \approx 0.7234$.

2.4 Distribution of optimum tour lengths 

While BHH and others [7] have shown that the variance of $L_E/N^{1-1/d}$ goes to zero as $N \to \infty$ (see also Figure 1), they have not determined how fast this variance decreases. More generally, one might ask how the distribution of $L_E/N^{1-1/d}$ behaves as $N \to \infty$. We are aware of only one result, by Rhee and Talagrand [16], showing that the probability of finding $L_E$ with $|L_E - \langle L_E\rangle| > t$ is smaller than $K\exp(-t^2/K)$ for some K. Unfortunately this is not strong enough to give bounds on the variance.

Let us characterize the distribution at d = 2 by numerical simulation. For motivation, consider the analogy between $L_E/N^{1-1/d}$ and $E/V$, the energy density in a disordered statistical system. If the system's correlation length ξ is finite (the system is not critical), E/V has a distribution which becomes Gaussian when $V \to \infty$. This is because, as the subvolumes increase, the energy densities in each subvolume become uncorrelated; the central limit theorem then applies. A consequence is that $\sigma^2$, the variance of E/V, decreases as $V^{-1}$. If ξ is infinite (the system is critical), then in general the distribution of E/V is not Gaussian. In both cases, though, the self-averaging of E/V suggests that the scaling variable $X = (E - \langle E\rangle)/(\sigma V)$ has a limiting distribution when $V \to \infty$.

In the case of the TSP, it can be argued using a theoretical analysis of the LK heuristic that at d ≥ 2 the system is not critical. By analogy with E/V, if we take subvolumes to contain a fixed number of cities, the central limit theorem then suggests that $L_E/N^{1-1/d}$ has a Gaussian distribution with $\sigma^2$ decreasing as $N^{-1}$. The scaling variable $X_N = (L_E - \langle L_E\rangle)/N^{1/2-1/d}$ should consequently have a Gaussian distribution with a finite width for $N \to \infty$ (and at d ≥ 2). Numerical results at d = 2 (see Figure 4) give good support for this.

Figure 4: Distribution of the 2-D Euclidean TSP scaling variable $X_N = (L_E - \langle L_E\rangle)/N^{1/2-1/d}$. Shaded region is for N = 12 (100,000 instances used) and solid line is for N = 30 (10,000 instances used). Superimposed curve shows the (extrapolated) limiting Gaussian.
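The exponent in $X_N$ can be spelled out in one line (our rearrangement of the argument above, assuming the variance of $L_E/N^{1-1/d}$ decreases exactly as $N^{-1}$):

```latex
\operatorname{Var}\!\left(\frac{L_E}{N^{1-1/d}}\right) \propto \frac{1}{N}
\quad\Longrightarrow\quad
\sqrt{\operatorname{Var}(L_E)} \propto N^{1-1/d}\,N^{-1/2} = N^{1/2-1/d},
\qquad
X_N = \frac{L_E - \langle L_E\rangle}{N^{1/2-1/d}} = O(1).
```

At d = 2 the exponent vanishes, so $X_N = L_E - \langle L_E\rangle$: in two dimensions the optimum tour length itself has fluctuations of order one.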
2.5 Conjectures on the large d limit

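In most statistical mechanics problems, the large dimensional limit introduces simplifications because fluctuations become negligible. For the TSP, can one expect $\beta_E(d)$ to have a simple limit as $d \to \infty$? Again, consider the kth nearest neighbor distance $D_k$. In the large N limit, (11) gives

$$N^{1/d}\langle D_k(N,d)\rangle \approx \frac{\Gamma(d/2+1)^{1/d}}{\sqrt{\pi}}\,\frac{\Gamma(k+1/d)}{\Gamma(k)}, \qquad (13)$$

or, at large d,

$$N^{1/d}\langle D_k(N,d)\rangle \approx \sqrt{\frac{d}{2\pi e}}\,(\pi d)^{1/(2d)}\left[1+\frac{A_k}{d}+O\!\left(\frac{1}{d^2}\right)\right], \qquad (14)$$

where $A_k = -\gamma + 1 + \frac{1}{2} + \cdots + \frac{1}{k-1}$ (γ is Euler's constant). Notice that $A_k \sim \ln k$ at large k. This suggests strongly that, unless the "typical" k used in the optimum tour grows exponentially in d, we may write for $d \to \infty$:

$$\beta_E(d) = \sqrt{\frac{d}{2\pi e}}\,(\pi d)^{1/(2d)}\left[1+O\!\left(\frac{1}{d}\right)\right]. \qquad (15)$$

Up to O(1/d), this expression is identical to the BHH lower bound on $\beta_E(d)$ discussed in Section 2.1, given by the large N limit of $N^{1/d}[\langle D_1(N,d)\rangle + \langle D_2(N,d)\rangle]/2$.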
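Equations (13) and (14) can be checked directly. The Python sketch below (function names are ours) evaluates the exact large N limit (13), shows that the BHH lower bound at d = 2 is exactly $5/8 = 0.625$, matching the value quoted in Section 2.1, and confirms that truncating (14) after the 1/d term leaves a relative error that shrinks roughly like $1/d^2$.

```python
import math

EULER_GAMMA = 0.5772156649015329


def nn_limit(k, d):
    """(13): lim_{N -> inf} N^{1/d} <D_k(N,d)> in the periodic unit hypercube."""
    return math.exp(math.lgamma(d / 2 + 1) / d - 0.5 * math.log(math.pi)
                    + math.lgamma(k + 1 / d) - math.lgamma(k))


def bhh_lower_bound(d):
    """Large-N limit of N^{1/d} [<D_1> + <D_2>] / 2."""
    return 0.5 * (nn_limit(1, d) + nn_limit(2, d))


def large_d_form(k, d):
    """(14) truncated after 1/d, with A_k = -gamma + 1 + 1/2 + ... + 1/(k-1)."""
    A_k = -EULER_GAMMA + sum(1.0 / j for j in range(1, k))
    return (math.sqrt(d / (2 * math.pi * math.e)) * (math.pi * d) ** (1 / (2 * d))
            * (1 + A_k / d))


def rel_err(k, d):
    return abs(large_d_form(k, d) / nn_limit(k, d) - 1)


print(bhh_lower_bound(2))                        # approximately 0.625 (= 5/8)
print([rel_err(1, d) for d in (10, 100, 1000)])  # shrinks roughly like 1/d^2
```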

A weaker conjecture than (15) has been proposed by Bertsimas and van Ryzin [17]:

$$\beta_E(d) \approx \sqrt{\frac{d}{2\pi e}} \quad \text{as } d \to \infty. \qquad (16)$$

This limiting behavior was motivated by an analogous result for a related combinatorial optimization problem, the minimum spanning tree. Unfortunately, there is no proof of either (15) or (16); in particular, the upper bound on $\beta_E(d)$ given by strip, discussed in Section 2.1, behaves as $\sqrt{d/6}$ at large d. Thus if the conjectures are true, the strip construction leads asymptotically to tours which are on average $\sqrt{2\pi e/6} \approx 1.69$ times too long. Can we derive stronger upper bounds? A number of heuristic construction methods should do better than strip, but there are no reliable calculations to this effect. The only improvements over the BHH results are due to Smith [18], who generalized the strip algorithm by optimizing the shape of the strips, leading to an upper bound which is $\sqrt{2}$ times greater than the predictions of (15) and (16) at large d.
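The factor 1.69 quoted above is just the ratio of the two square roots, and it does not depend on d; a short Python check (ours):

```python
import math


def strip_over_conjecture(d):
    """Ratio of the strip bound sqrt(d/6) to the conjectured sqrt(d/(2 pi e))."""
    return math.sqrt(d / 6) / math.sqrt(d / (2 * math.pi * math.e))


ratios = [strip_over_conjecture(d) for d in (10, 100, 1000)]
print(ratios)  # the same value, sqrt(2*pi*e/6), for every d
```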

In spite of our inability to derive an upper bound which, together with the BHH lower bound, would confirm the two conjectures for $d \to \infty$, we are confident that (15) and (16) are true because of non-rigorous yet convincing arguments. One is a proof that (16) is satisfied for the TSP if it is satisfied for another related combinatorial optimization problem (see Appendix E for details). A more powerful argument, presented in Section 3.6, relies on a theoretical analysis of the LK heuristic. It suggests that up to $O(1/d^2)$, $\beta_E(d)$ is given by a random link approximation, leading to a conjecture even stronger than (15).

3 The random link TSP 

3.1 Correspondence with the Euclidean TSP 

Let us now consider a problem at first sight dramatically different from the Euclidean TSP. Instead of taking the positions of the N cities to be independent random variables, take the lengths $l_{ij} = l_{ji}$ between cities i and j ($1 \le i, j \le N$) to be independent random variables, identically distributed according to some $p(l)$. We speak of lengths rather than distances, as there is no distance metric here. This problem, introduced by physicists in the 1980s in search of an analytically tractable form of the traveling salesman problem, is called the random link TSP.

The connection between this TSP and the Euclidean TSP is not obvious, as we now have random links rather than random points. Nevertheless, one can relate the two problems. To see this, consider the probability distribution for the distance l between a fixed pair of cities in the Euclidean TSP. This distribution, in the unit hypercube with periodic boundary conditions, is given for $0 \le l \le 1/2$ by the expression in (6):

$$p(l) = \frac{d\,\pi^{d/2}}{\Gamma(d/2+1)}\,l^{d-1}. \qquad (17)$$

Of course, in the Euclidean TSP the link lengths are by no means independent random variables: correlations such as the triangle inequality are present. However, as noted by Mezard and Parisi, correlations appear exclusively when considering three or more distances, since any two Euclidean distances are necessarily independent (with periodic boundary conditions every position is equivalent, so the distances from a given city to two others do not depend on where that city lies). Let us adopt (17) as the $l_{ij}$ distribution in the limit of small l for the random link TSP, where d in this case no longer represents physical dimension but is simply a parameter of the model. The Euclidean and random link problems then have the same small-l one- and two-link distributions. In the large N limit the random link TSP may therefore be considered, rather than as a separate problem, as a random link approximation to the Euclidean TSP. Only joint distributions of three or more links differ between these two TSPs. If indeed the correlations involved are not too important, then the random link $\beta_{RL}(d)$ can be taken as a good estimate of $\beta_E(d)$. We shall see that this is true, particularly for large d.
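A random link instance is straightforward to generate. The sketch below draws each $l_{ij}$ (for i < j) independently by inverting $t = V_d(l) = \pi^{d/2}l^d/\Gamma(d/2+1)$ for uniform t, so that the lengths follow density (17) exactly up to the cutoff where $V_d = 1$. Only the small-l form is fixed by the text; the cutoff, the function names and the instance size are our assumptions. Note that nothing enforces the triangle inequality, which is precisely the kind of three-link correlation the random link model discards.

```python
import math
import random


def ball_volume(l, d):
    """V_d(l) = pi^{d/2} l^d / Gamma(d/2 + 1)."""
    return math.pi ** (d / 2) * l ** d / math.gamma(d / 2 + 1)


def random_link_instance(N, d, rng):
    """Symmetric N x N matrix of i.i.d. link lengths l_ij = l_ji.

    ASSUMPTION (ours): P(l_ij <= l) = V_d(l), i.e. density (17) on
    0 <= l <= l_max with V_d(l_max) = 1; only small l is fixed by the text.
    """
    c = math.gamma(d / 2 + 1) / math.pi ** (d / 2)
    L = [[0.0] * N for _ in range(N)]
    for i in range(N):
        for j in range(i + 1, N):
            t = rng.random()                        # uniform on [0, 1)
            L[i][j] = L[j][i] = (c * t) ** (1 / d)  # inverse of t = V_d(l)
    return L


rng = random.Random(4)
N, d = 200, 2
L = random_link_instance(N, d, rng)
links = [L[i][j] for i in range(N) for j in range(i + 1, N)]
frac = sum(l <= 0.3 for l in links) / len(links)
print(frac, ball_volume(0.3, d))  # empirical vs. exact P(l <= 0.3)
```

At d = 1 this reduces to lengths uniform on [0, 1/2), which coincides with the one-dimensional periodic Euclidean case.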



3.2 Scaling at large N 

As in the Euclidean case, we are interested in understanding the $N \to \infty$ scaling law in the random link TSP. It is relatively simple to see, following an argument similar to the one in Section 2.2, that the nearest neighbor distances have a probability distribution with a scaling factor $N^{-1/d}$ at large N. Vannimenus and Mezard have suggested that the random link optimum tour length with N links will then scale as $N^{1-1/d}$, and that it will be self-averaging, i.e.,

$$\lim_{N\to\infty}\frac{L_{RL}}{N^{1-1/d}} = \beta_{RL}(d), \qquad (18)$$

parallel to the BHH theorem (3) for the Euclidean case. This involves the implicit assumption that optimum tours sample a representative part of the distribution, so that no further N scaling effects are introduced. The assumption seems reasonable based on the analogy with the Euclidean TSP, and for our purposes we shall accept here that $\beta_{RL}(d)$ exists. However, there is to our knowledge no mathematical proof of self-averaging in the random link TSP.

Following the discussion of Section 2.1, let us consider some bounds on the ensemble average $\langle L_{RL}\rangle$ as derived in [20]. As before, we get a lower bound on $\beta_{RL}(d)$ using nearest and next-nearest neighbor distances. For an upper bound, the "strip" algorithm used in the Euclidean case (Section 2.1) cannot be applied to the random link case. Instead, Vannimenus and Mezard make use of an algorithm called "greedy": this constructs a non-optimal tour by starting at an arbitrary city, and then successively picking the link to the nearest available city until all cities are used once and a closed tour is formed. At d > 1, greedy gives rise to tour lengths that are self-averaging, and leads to the upper bound [20]

$$\beta_{RL}(d) \le \frac{1}{\sqrt{\pi}}\,\frac{\Gamma(d/2+1)^{1/d}\,\Gamma(1/d)}{d-1}. \qquad (19)$$

At d = 1, the presumed scaling (18) suggests that $\langle L_{RL}\rangle$ is independent of N, whereas greedy generates tour lengths which grow as ln N. There is numerical evidence [22], however, that the d = 1 model does indeed satisfy (18), and that $\beta_{RL}(1) \approx 1.0208$.
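The greedy construction and the bound (19) can be sketched as follows. The tour builder implements the procedure just described on an arbitrary symmetric length matrix (here uniform random lengths, purely for illustration), and `greedy_bound` evaluates the right-hand side of (19). At d = 2 the bound equals exactly 1, and at d = 3, 4, 5 it lies above the cavity values $\beta_{RL}(d)$ quoted in Section 3.3, as it must; the function names are ours.

```python
import math
import random


def greedy_tour(L, start=0):
    """'Greedy' as described above: from the current city always move to the
    nearest city not yet visited; the tour closes back to the start."""
    n = len(L)
    tour = [start]
    unvisited = set(range(n)) - {start}
    while unvisited:
        last = tour[-1]
        nxt = min(unvisited, key=lambda j: L[last][j])
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour


def closed_length(L, tour):
    n = len(tour)
    return sum(L[tour[i]][tour[(i + 1) % n]] for i in range(n))


def greedy_bound(d):
    """Right-hand side of (19); it diverges as d -> 1."""
    return math.exp(math.lgamma(d / 2 + 1) / d + math.lgamma(1 / d)) / (
        math.sqrt(math.pi) * (d - 1))


# A small symmetric matrix with uniform random lengths, for illustration only.
rng = random.Random(5)
n = 30
L = [[0.0] * n for _ in range(n)]
for i in range(n):
    for j in range(i + 1, n):
        L[i][j] = L[j][i] = rng.random()
tour = greedy_tour(L)
print(closed_length(L, tour))
print({d: round(greedy_bound(d), 4) for d in (2, 3, 4, 5)})
```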
3.3 Solution via the cavity equations



Since the work of Vannimenus and Mezard, several groups [23], [24], [25] have tried to "solve" the statistical mechanical problem of the random link TSP at finite temperature using the replica method, a technique developed for analyzing disordered systems such as spin glasses [26]. To date, it has only been possible to obtain part of the high temperature series of this system [23]. In view of the intractability of these replica approaches, Mezard and Parisi have derived an analytical solution using another technique from spin glass theory, the "cavity method." The details of this approach are beyond the scope of this paper, and are discussed in several technical articles [27], [26]. For readers acquainted with the language of disordered systems, however, the broad outline is as follows: one begins with a representation of the TSP in terms of a Heisenberg (multi-dimensional spin) model in the limit where the spin dimension goes to zero. Under the assumption that this system has only one equilibrium state (no replica symmetry breaking), Mezard and Parisi have then written a recursion equation for the system when a new (N + 1)th spin is added. The cavity method then supposes that this new spin's effect on the N other spins is negligible in the large N limit, and that its magnetization may be expressed in terms of the magnetizations of the other spins.

Using this method, Krauth and Mezard have derived a self-consistent equation for the random link TSP at $N \to \infty$. They have determined the probability distribution of link lengths in the optimum tour in terms of $G_d(x)$, where $G_d(x)$ is the solution to the integral equation

$$G_d(x) = \int_{-x}^{+\infty}\frac{(x+y)^{d-1}}{(d-1)!}\,[1+G_d(y)]\,e^{-G_d(y)}\,dy. \qquad (20)$$

Their probability distribution leads to the prediction

$$\beta_{RL}(d) = \frac{d}{2\sqrt{\pi}}\left[\frac{\Gamma(d/2+1)}{\Gamma(d+1)}\right]^{1/d}\int_{-\infty}^{+\infty}G_d(x)\,[1+G_d(x)]\,e^{-G_d(x)}\,dx. \qquad (21)$$

These equations can be solved numerically, as well as analytically in terms of a 1/d power series (see next section). At d = 1, Krauth and Mezard compared their prediction with the results of a direct simulation of the random link model; their numerical study strongly suggests that the cavity prediction is exact in this case. It has been argued, furthermore, that the cavity method is exact at $N \to \infty$ for any distribution of the independent random links [26]. Good numerical evidence has been found for this, notably in the case of the matching problem, a related combinatorial optimization problem [28]. The validity of the cavity assumptions therefore does not appear to be sensitive to the dimension d, and we shall assume that (21) holds for the random link TSP at all d.

Krauth and Mezard computed the d — 1 and d = 2 cases to give /3rl(1) = 1.0208 and 
Prl(2) = 0.7251. Since (3rl(c£) is taken to approximate /3s(d), let us compare these values with 
their Euclidean counterparts. At d — 1, the Euclidean TSP with periodic boundary conditions 
is trivial (/^(l) = 1); the random link TSP thus has a 2.1% relative excess. At d = 2, 
comparing with /3e(2) = 0.7120 found in Section |2.3| , the random link TSP has a 1.8% excess. 
In low dimensions, the random link results are then a good approximation of the Euclidean 
results. The approximation is better than Krauth and Mezard believed, since they made the 
comparison at d — 2 using the considerably overestimated Euclidean value of /3e(2) ~ 0.749 
from |5|]. 

Extending the numerical solutions to higher dimensions, at $d = 3$ we find $\beta_{RL}(3) = 0.7100$, which, compared with $\beta_E(3) = 0.6979$, has an excess of 1.7%. Some further random link values are $\beta_{RL}(4) = 0.7322$ and $\beta_{RL}(5) = 0.7639$. The value at $d = 4$ may be compared with the Euclidean estimate of Johnson et al. [15], $\beta_E(4) \approx 0.7234$, giving an excess of 1.2%. The $\beta_E(d)$ data at $d = 1, 2, 3, 4$ therefore suggest that the random link approximation improves with increasing dimension. This leads us to study the limit when $d$ becomes large.
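The relative excesses quoted above follow directly from the listed values; a short check in Python, with $\beta_E(1) = 1$ as stated for $d = 1$:

```python
# (beta_RL(d), beta_E(d)) pairs quoted in the text; beta_E(4) is from Johnson et al. [15].
values = {1: (1.0208, 1.0), 2: (0.7251, 0.7120), 3: (0.7100, 0.6979), 4: (0.7322, 0.7234)}


def relative_excess_percent(beta_rl, beta_e):
    """Relative excess of the random link value over the Euclidean one, in percent."""
    return 100.0 * (beta_rl - beta_e) / beta_e


for d, (b_rl, b_e) in values.items():
    print(f"d = {d}: {relative_excess_percent(b_rl, b_e):.1f}%")   # 2.1%, 1.8%, 1.7%, 1.2%
```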

3.4 Dimensional dependence 

The large $d$ limit was considered by Vannimenus and Mezard [20]. For $\beta_{RL}(d)$, the lower bound obtained from $(\langle D_1(N,d)\rangle + \langle D_2(N,d)\rangle)/2$ by way of (17) and the upper bound given in (19) differ at large $d$ only by $O(1/d)$, giving:

$$\beta_{RL}(d) = \sqrt{\frac{d}{2\pi e}}\,(\pi d)^{1/(2d)}\left[1 + O\!\left(\frac{1}{d}\right)\right]. \qquad (22)$$



Note that this exact result is the random link analogue of the Euclidean conjecture (15).

For values of $d < 50$, we have calculated $\beta_{RL}(d)$ numerically using the cavity equations (20) and (21). The results are shown in Figure 5, along with the converging upper and lower bounds and our low-$d$ Euclidean results.

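For large $d$, we may see whether the cavity equations are compatible with (22) by solving them analytically in terms of a $1/d$ power series. Define $\tilde G_d(x) = G_d\bigl(\Gamma(d+1)^{1/d}\,[1/2 + x/d]\bigr)$. Equation (20) may then be written:

$$\tilde G_d(x) = \int_{-x-d}^{+\infty} \left(1 + \frac{x+y}{d}\right)^{d-1} \bigl[1 + \tilde G_d(y)\bigr]\,e^{-\tilde G_d(y)}\,dy \qquad (23)$$

$$\tilde G_d(x) = \int_{-x-d}^{+\infty} e^{x+y}\left[1 - \frac{1}{d}\left(x + y + \frac{(x+y)^2}{2}\right) + O\!\left(\frac{1}{d^2}\right)\right] \bigl[1 + \tilde G_d(y)\bigr]\,e^{-\tilde G_d(y)}\,dy. \qquad (24)$$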



Strictly speaking, the expansion of $(1 + [x+y]/d)^{d-1}$ is only valid in the interval $-x-d < y < -x+d$; however, for large $y$ it can be shown that $\tilde G_d(y) \sim y^d$, so the $e^{-\tilde G_d(y)}$ factor in the integrand makes the $y > -x+d$ contribution exponentially small in $d$.

Furthermore, extending the integral's lower limit to include the region $y < -x-d$ also contributes a remainder term exponentially small in $d$. If we write the integral with its lower limit at $y = -\infty$, the equation may be solved:



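$$\tilde G_d(x) = \sqrt{2}\,e^{x}\left\{1 - \frac{1}{d}\left[\frac{x^2}{2} + \frac{3 - \ln 2 - 2\gamma}{2}\,x + \frac{6\ln 2 + 12\gamma - 9 - (\ln 2 + 2\gamma)^2}{8}\right] + O\!\left(\frac{1}{d^2}\right)\right\} \qquad (25)$$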
Figure 5: Dimensional dependence of the rescaled random link TSP optimum, shown by small points, between the converging "greedy" upper bound (dotted line) and the nearest-neighbors lower bound (dashed line). Plus signs at $d = 2$ and $d = 3$ show Euclidean results for comparison.
where $\gamma$, we recall, represents Euler's constant. Using (21), we then find

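$$\beta_{RL}(d) = \sqrt{\frac{d}{2\pi e}}\,(\pi d)^{1/(2d)}\left[1 + \frac{2 - \ln 2 - 2\gamma}{d} + O\!\left(\frac{1}{d^2}\right)\right] \qquad (26)$$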



which is perfectly compatible with (22). This provides further evidence that the cavity method is exact for the random link TSP.
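A consistency check of (25) against (26) can be sketched with NumPy; the names and the quadrature grid are our own illustrative choices. By Stirling's formula, the prefactor of (21) times the Jacobian $\Gamma(d+1)^{1/d}/d$ of the rescaling equals $\tfrac12\sqrt{d/(2\pi e)}\,(\pi d)^{1/(2d)}$ up to $O(1/d^2)$, so the $1/d$ coefficient in (26) should be half the $1/d$ coefficient of $\int \tilde G_d(1 + \tilde G_d)\,e^{-\tilde G_d}\,dx$. The sketch evaluates that integral with the truncated form (25) and extracts the coefficient by differencing at very large $d$.

```python
import math

import numpy as np

EULER_GAMMA = 0.5772156649015329
S = 3.0 - math.log(2.0) - 2.0 * EULER_GAMMA      # the combination 3 - ln 2 - 2 gamma in (25)


def g_tilde(x, d):
    """Rescaled cavity solution (25), truncated after the 1/d term (d = inf: leading order)."""
    correction = x**2 / 2 + S * x / 2 - S**2 / 8
    return math.sqrt(2.0) * np.exp(x) * (1.0 - correction / d)


def rescaled_integral(d, lo=-40.0, hi=4.0, n=44001):
    """Trapezoid estimate of the integral of G (1 + G) exp(-G) over the rescaled x."""
    x = np.linspace(lo, hi, n)
    G = g_tilde(x, d)
    f = G * (1.0 + G) * np.exp(-G)
    h = x[1] - x[0]
    return h * (f.sum() - 0.5 * (f[0] + f[-1]))


def beta_rl_asymptotic(d):
    """Truncated large-d expansion (26)."""
    c1 = 2.0 - math.log(2.0) - 2.0 * EULER_GAMMA
    return math.sqrt(d / (2 * math.pi * math.e)) * (math.pi * d) ** (1 / (2 * d)) * (1 + c1 / d)


if __name__ == "__main__":
    J_inf = rescaled_integral(math.inf)
    d = 1e5
    print("leading-order integral:", J_inf)                          # should be close to 2
    print("1/d coefficient from (25):", d * (rescaled_integral(d) - J_inf) / 2)
    print("2 - ln 2 - 2 gamma:", 2 - math.log(2) - 2 * EULER_GAMMA)
```

The constant term of (25) drops out of this check (its contribution integrates to zero at order $1/d$), so only the $x$ and $x^2$ coefficients are tested here.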

3.5 Renormalized random link model at large d 

We can motivate the large-$d$ scaling found in the previous section by examining a different sort of random link TSP. Consider a new "renormalized" model where link "lengths" $x_{ij}$ are obtained from the original $l_{ij}$ by the linear transformation $x_{ij} = d\,[l_{ij} - \langle D_1(N,d)\rangle]/\langle D_1(N,d)\rangle$. Note that the $x_{ij}$ may take on negative values, and that the nearest neighbor length in this new model has mean zero. Since the transformation is linear with a positive slope, and every tour contains exactly $N$ links, it maps every tour length through the same increasing function; there is therefore a direct equivalence between the renormalized $x_{ij}$ and original $l_{ij}$ TSPs, and the two have the same optimum tours. The renormalized optimum tour length $L_x$ may then be given in terms of the original tour length $L_l$ by



$$L_x = d\,\frac{L_l - N\langle D_1(N,d)\rangle}{\langle D_1(N,d)\rangle}. \qquad (27)$$

Now take $N \to \infty$ and $d \to \infty$. It may be seen from the distribution (11) and the $\langle D_1(N,d)\rangle$ expansion (14) that the random variables $x_{ij}$ have the $d$-independent probability distribution $p(x) \sim N^{-1}\exp(x - \gamma)$. Also, in the large $N$ limit, since $L_l$ scales as $N^{1-1/d}$ and $\langle D_1\rangle$ scales as $N^{-1/d}$, we expect $\langle L_x\rangle \sim N\mu$ for some $\mu$ which must be, like $p(x)$, independent of $d$. Then, from (27), the TSP in the original variables satisfies



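$$\langle L_l\rangle \approx N\langle D_1(N,d)\rangle\left[1 + \frac{\mu}{d} + \cdots\right] \qquad (28)$$

or, using the expansion (14),

$$\beta_{RL}(d) = \sqrt{\frac{d}{2\pi e}}\,(\pi d)^{1/(2d)}\left[1 + \frac{\mu - \gamma}{d} + O\!\left(\frac{1}{d^2}\right)\right]. \qquad (29)$$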



This result may be compared with our cavity solution (26), where the $1/d$ coefficient is equal to $2 - \ln 2 - 2\gamma$. If the cavity method is correct at $O(1/d)$, which we strongly believe is the case, then a direct solution of the renormalized model should give $\mu = 2 - \ln 2 - \gamma$. Work is currently in progress to test this claim by numerical methods.
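The equivalence claimed for the renormalized model (same optimum tours, and $L_x$ related to $L_l$ by (27)) can be checked by brute force on a tiny instance. In this sketch, which is ours and not part of the analysis above, link lengths are independent uniform numbers on $[0, 1]$, $N = 7$ and $d = 3$ are arbitrary, and the sample mean nearest-neighbor length of the instance stands in for $\langle D_1(N,d)\rangle$; the invariance holds for any positive constant in its place. All function names are hypothetical.

```python
import itertools
import random


def tour_length(tour, lengths):
    """Sum of link lengths along the closed tour (N links for N cities)."""
    return sum(lengths[tour[k - 1]][tour[k]] for k in range(len(tour)))


def brute_force_optimum(lengths):
    """Exact optimum by enumerating every tour that starts at city 0 (tiny N only)."""
    n = len(lengths)
    best_len, best_tour = float("inf"), None
    for perm in itertools.permutations(range(1, n)):
        tour = (0,) + perm
        length = tour_length(tour, lengths)
        if length < best_len:
            best_len, best_tour = length, tour
    return best_len, best_tour


def renormalize(lengths, d, d1_mean):
    """x_ij = d (l_ij - <D_1>) / <D_1> for every link i != j."""
    n = len(lengths)
    return [[0.0 if i == j else d * (lengths[i][j] - d1_mean) / d1_mean
             for j in range(n)] for i in range(n)]


def random_link_instance(n, rng):
    """Symmetric matrix of independent uniform link lengths (illustrative choice)."""
    l = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            l[i][j] = l[j][i] = rng.random()
    return l


if __name__ == "__main__":
    rng = random.Random(3)
    N, d = 7, 3
    l = random_link_instance(N, rng)
    d1 = sum(min(l[i][j] for j in range(N) if j != i) for i in range(N)) / N  # stand-in for <D_1>
    L_l, tour_l = brute_force_optimum(l)
    L_x, tour_x = brute_force_optimum(renormalize(l, d, d1))
    print(L_x, d * (L_l - N * d1) / d1)   # equal up to rounding, as in (27)
```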
Figure 6: Recursive construction of removed links (dashed lines) and added links (bold lines) in an LK search.

3.6 Large d accuracy of the random link approximation 

Since the random link model is considered to be an approximation to the Euclidean case, it is natural to ask whether the approximation becomes exact as $d \to \infty$. In this section we argue that: (i) in stochastic TSPs, good tours can be obtained using almost exclusively low order neighbors; (ii) the geometry inherent in the Euclidean TSP leads to $\beta_E(d) < \beta_{RL}(d)$ in all dimensions $d$; (iii) the relative error of the random link approximation decreases as $1/d^2$ at large $d$. All three claims are based on a theoretical analysis of the Lin-Kernighan (LK) heuristic algorithm for constructing near-optimal tours.

The LK algorithm works as follows [1, 29]. An LK search starts with an arbitrary tour. The principle of the search is to substitute links in the tour recursively, as illustrated schematically in Figure 6. The first step consists of choosing an arbitrary starting city $i_0$. Call $i_1$ the next city on the tour, and $l_1$ the link between the two. Now remove this link. Let $i'_1$ be the nearest neighbor to $i_1$ that was not connected to $i_1$ on the original tour, and let $l'_1$ be a new link connecting $i_1$ to $i'_1$. We now have not a tour but a "tadpole graph," containing a loop with a tail attached to it at $i'_1$. At this point, call $i_2$ one of the cities next to $i'_1$ on the original tour, and remove the link $l_2$ between the two. There are two possibilities for $i_2$ (and thus $l_2$): LK chooses the one which, if we were to put in a new link between $i_2$ and $i_0$, would give a single closed tour. Now, as before, let $i'_2$ be the nearest neighbor of $i_2$ that was not connected to $i_2$ on the original tour, and let $l'_2$ be a new link between the two. This gives a new tadpole. The process continues recursively in this manner, with the vertex hopping around while the end point stays fixed, until no new tadpoles are found. At each step, LK chooses the new $i_m$ so as to allow the path to be closed up between $i_m$ and $i_0$, forming a single tour; the result of the LK search is then the best of all such closed-up tours. The LK algorithm consists of repeating these LK searches on different starting points $i_0$, each time using the current best tour as a starting tour, until no further tour improvements are possible.
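The search just described can be sketched in Python. This is a deliberately simplified version written for illustration, not the full algorithm of [1]: it works on Euclidean points, considers only the single nearest allowed neighbor at each step (no backtracking), and stops when the running gain (removed length minus added length) would become non-positive or after `max_steps` steps. That stopping rule is the usual LK gain criterion, which we assume here because the description above does not spell one out. The names `lk_search` and `lin_kernighan` are hypothetical. The tadpole is handled implicitly: adding the link $(i_m, i'_m)$ and removing the link from $i'_m$ to its predecessor on the current path amounts to reversing a prefix of a Hamiltonian path whose far end stays fixed at $i_0$.

```python
import math
import random


def tour_length(tour, pts):
    """Length of the closed tour visiting pts in the given order."""
    return sum(math.dist(pts[tour[k - 1]], pts[tour[k]]) for k in range(len(tour)))


def lk_search(tour, pts, start, max_steps=50):
    """One simplified LK search with starting city `start` (i_0 in the text).

    Returns (length, tour) of the best closed-up tour found, or of the input tour.
    """
    def dist(a, b):
        return math.dist(pts[a], pts[b])

    n = len(tour)
    p = tour.index(start)
    order = tour[p:] + tour[:p]                   # rotate so that order[0] is i_0
    i0 = order[0]
    # Tour neighbours at the start of this search ("the original tour").
    tour_nbrs = {order[k]: {order[k - 1], order[(k + 1) % n]} for k in range(n)}
    best_len, best_tour = tour_length(order, pts), list(order)

    # Remove l_1 = (i_0, i_1): a Hamiltonian path from the free end i_1 to the fixed end i_0.
    path = order[1:] + [i0]
    path_len = best_len - dist(i0, order[1])
    gain = dist(i0, order[1])                     # removed length minus added length so far
    for _ in range(max_steps):
        a = path[0]                               # current free end i_m
        banned = tour_nbrs[a] | {a, path[1]}
        candidates = [c for c in path if c not in banned]
        if not candidates:
            break
        a_new = min(candidates, key=lambda c: dist(a, c))   # i'_m: nearest allowed neighbour
        if gain - dist(a, a_new) <= 0:            # LK gain criterion (our assumed stopping rule)
            break
        k = path.index(a_new)
        b = path[k - 1]                           # i_{m+1}: linking b to i_0 closes one tour
        gain += dist(b, a_new) - dist(a, a_new)
        path_len += dist(a, a_new) - dist(b, a_new)
        # Adding (a, a_new) and removing (b, a_new) is the same as reversing path[:k].
        path = path[:k][::-1] + path[k:]
        closed = path_len + dist(path[0], i0)     # close up between the new free end and i_0
        if closed < best_len - 1e-12:
            best_len, best_tour = closed, list(path)
    return best_len, best_tour


def lin_kernighan(pts, tour):
    """Repeat LK searches from every starting city until none improves the tour."""
    best = tour_length(tour, pts)
    improved = True
    while improved:
        improved = False
        for start in range(len(pts)):
            length, new_tour = lk_search(tour, pts, start)
            if length < best - 1e-12:
                best, tour, improved = length, new_tour, True
    return best, tour


if __name__ == "__main__":
    rng = random.Random(7)
    pts = [(rng.random(), rng.random()) for _ in range(30)]
    tour = list(range(30))
    rng.shuffle(tour)
    print("random tour:", round(tour_length(tour, pts), 3))
    best, tour = lin_kernighan(pts, tour)
    print("after LK sketch:", round(best, 3))
```

Every substituted link $l'_m$ in this sketch goes to a city's nearest allowed neighbor, which is the property used in the arguments that follow.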

Let us first sketch why the LK algorithm leads to tours which use only links between "near" neighbors, where "near" means that the neighborhood order $k$ is small and does not grow with $d$. Consider any tour where a significant fraction of the links connect distant neighbors (large $k$). The links $l'_m$ which the LK search substitutes for the $l_m$ are, by definition, between very near neighbors ($k \le 3$, since at most two of a city's nearest neighbors can be excluded as its tour neighbors). As long as many long links exist, the probability at each step of substituting a near neighbor in place of a far neighbor is significant. Towards the beginning of an LK search this probability is roughly constant, so the expected tadpole length decreases linearly with the number of steps. Even taking into account the fact that closing up the path between $i_m$ and $i_0$ might require inserting a link with $k > 3$, there is a high probability as $N \to \infty$ that the improvement in tadpole length far outweighs this cost of closing the tour. Thus for stochastic TSPs, regardless of $d$, the LK algorithm can at large $N$ replace all but a tiny fraction of the long links with short links. It follows that, in accordance with our Euclidean TSP assumption of Section 2.5, the "typical" $k$ used in the optimum tour remains small at large $d$. This provides strong support for the $\beta_E(d)$ conjectures (15) and (16). A consequence, making use of the exact asymptotic $\beta_{RL}(d)$ result (22), is that the relative difference between $\beta_E(d)$ and $\beta_{RL}(d)$ is at most of $O(1/d)$.

Our second argument concerns why $\beta_{RL}(d)$ must be greater than $\beta_E(d)$ at all $d$. For the random link TSP there is no triangle inequality, which means that, given two edges of a triangle, the third edge is on average longer than it would be for the Euclidean TSP. Applying this to our LK search, we can expect the link between $i_m$ and $i_0$ closing up the tour to be longer in the random link case than in the Euclidean case. Thus, on average, the LK algorithm will find longer random link tours than Euclidean tours. In fact, this property holds as well for any LK-like algorithm where the method of choosing the $l_m$ and $l'_m$ links is generalized. If the algorithm were to allow all possibilities for $l_m$ and $l'_m$, we would be sure of obtaining the exact optimum tour, given a long enough search. In that case, the inequality on the tour lengths found by our algorithm leads directly to $\beta_{RL}(d) > \beta_E(d)$. Not surprisingly, the numerical data confirm this inequality for $d$ up to 4 (although one should be cautious when applying the argument at $d = 1$). Note also that the inequality in itself implies conjectures (15) and (16) for the Euclidean model, since it supplies precisely the upper bound we need on $\beta_E(d)$.

Finally, let us explain why the relative difference between $\beta_{RL}(d)$ and $\beta_E(d)$ should be of $O(1/d^2)$. This involves quantifying the tour length improvement discussed above. It is clear that any non-optimal tour can be improved to the point where links are mostly between neighbors of low order. If LK, or a generalized LK-like algorithm, is able to improve the tour further, the relative difference in length will be of $O(1/d)$; we see this from (13), noting that the neighborhood order $k$ is small both before and after the LK search. Now we need to quantify the probability that LK indeed succeeds in improving the tour. We may consider the vertex of the LK tadpole graph as executing a random walk, in which case the probability of closing up a tour by a sufficiently short link is equivalent to the probability of the random walk's end-to-end distance being sufficiently small. In that case it may be shown that, over the course of an LK search, the probability of successfully closing a Euclidean tour minus the probability of successfully closing a random link tour scales at large $d$ as $2/(d-2)$. From this, we conclude that improvements in the Euclidean model are $O(1/d)$ more probable than in the random link model. Now, the relative tour length improvement for the Euclidean TSP compared to the random link TSP is simply the relative tour length improvement when a better tour is found, times this excess probability of finding a better tour; hence it is $O(1/d^2)$. If we consider a generalized LK search as described in the previous paragraph, where the algorithm necessarily finds the true optimum, then this result applies to the exact $\beta$'s: the relative difference between $\beta_{RL}(d)$ and $\beta_E(d)$ will scale at large $d$ as $1/d^2$.
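The scaling bookkeeping of this argument fits on one line (a restatement of the reasoning above, not an additional result):

$$\frac{\beta_{RL}(d) - \beta_E(d)}{\beta_E(d)} \;\sim\; \underbrace{O\!\left(\frac{1}{d}\right)}_{\text{gain when a better tour is found}} \times \underbrace{\frac{2}{d-2}}_{\text{excess closing probability}} \;=\; O\!\left(\frac{1}{d^2}\right).$$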

Three comments are in order concerning this surprisingly good accuracy of the random link approximation. First, the factor $2/(d-2)$ is only appropriate for large $d$; it is not small even for $d = 4$, where it equals 1. (Its divergence at $d = 2$ is associated with the fact that a two-dimensional random walk returns to its origin with probability 1.) We therefore expect the $1/d^2$ scaling to become apparent only for $d > 5$, beyond the range of our numerical data. Second, we have seen that the coefficient of the $1/d$ term in $\beta_{RL}(d)$ may be obtained by the cavity method. Assuming that this method is correct and that $\beta_{RL}(d)$ and $\beta_E(d)$ do indeed converge as $1/d^2$, this leads to a particularly strong conjecture for the Euclidean TSP:



$$\beta_E(d) = \sqrt{\frac{d}{2\pi e}}\,(\pi d)^{1/(2d)}\left[1 + \frac{2 - \ln 2 - 2\gamma}{d} + O\!\left(\frac{1}{d^2}\right)\right] \qquad (30)$$

Third, this type of LK analysis can in fact be extended to many other combinatorial optimization problems, such as the assignment, matching and bipartite matching problems. In these cases, we expect the random link approximation to give rise to an $O(1/d^2)$ relative error, just as in the TSP.

4 Summary and conclusions 

The first goal in our work has been to investigate the finite size scaling of $L_E$, the optimum Euclidean traveling salesman tour length, and to obtain precise estimates for its large $N$ behavior. Motivated by a remarkable universality in the $k$th nearest neighbor distribution, we have found that under periodic boundary conditions, the convergence of $\langle L_E\rangle/N^{1-1/d}$ to its limit $\beta_E(d)$ is described by a series in $1/N$. This has enabled us to extract $\beta_E(2)$ and $\beta_E(3)$ using numerical simulations at small values of $N$, where errors are easy to control. Furthermore, thanks to a bias-free variance reduction method (see Appendix B), these estimates are extremely precise.

Our second goal has been to examine the random link TSP, where there are no correlations between link lengths. We have considered it as an approximation to the Euclidean TSP, in order to better understand the dimensional scaling of $\beta_E(d)$. For small $d$, we have used the cavity method to obtain numerical values of the random link $\beta_{RL}(d)$. Comparing these with our numerical values for $\beta_E(d)$ shows that the random link approximation is remarkably good, accurate to within about 2% at low dimension. For large $d$, we have solved the cavity equations analytically to give $\beta_{RL}(d)$ in terms of a $1/d$ series. We have then argued, using a theoretical analysis of iterative tour improvement algorithms, that the relative difference between $\beta_{RL}(d)$ and $\beta_E(d)$ decreases as $1/d^2$. This leads to our conjecture (30) on the large $d$ behavior of $\beta_E(d)$, specifying both its asymptotic form and its leading order correction.

Let us conclude with some remaining open questions. First, while the cavity method most likely gives the exact result for the random link TSP, we would be interested in seeing this argued on a more fundamental physical level. Readers with a background in disordered systems will recognize that the underlying assumption of a unique equilibrium state is false in many NP-complete problems, and in particular in the spin-glass problem that inspired the cavity method. What makes the TSP different? Second, our renormalized random link model provides an alternative approach to finding the $1/d$ coefficient of the power series in $\beta_{RL}(d)$, and could prove a useful test of the cavity method's validity. A solution to the renormalized model using heuristic methods appears within reach. Third, the $O(1/d^2)$ convergence of the random link approximation merits further study, from both numerical and analytical perspectives. Numerically, Euclidean simulations at $d > 5$ could provide powerful support for the form of the convergence, and thus for our conjecture (30). Analytically, the qualitative arguments presented in Section 3.6, based on the LK algorithm, could perhaps be refined by a more quantitative approach. Lastly, it is worth noting that the $O(1/d^2)$ convergence should apply equally well to the distribution of link lengths in the optimum tour. The random link prediction for this distribution can be obtained from the cavity method [4]; an interesting test would then be to compare it with simulation results for the true Euclidean distribution.
Appendix A

Overview of the numerical methodology 

In the following, we discuss the procedures used to obtain the raw data from which $\beta_E(d)$ and the finite size scaling coefficients are extracted. Two major problems must be solved in order to get good estimates of $\beta_E(N,d)$. First, $\beta_E(N,d)$ is defined as an ensemble average $\langle L_E(N,d)\rangle/N^{1-1/d}$, but is measured by a numerical average over a finite sample of instances. The instance-to-instance fluctuations in $L_E$ give rise to a statistical error, which decreases only as the inverse square root of the sample size. Keeping the statistical error down to acceptable levels could require inordinate amounts of computing time. We therefore find it useful to introduce a variance reduction trick: instead of measuring $L_E$, we measure $L_E - \lambda L^*$, where $\lambda$ is a free parameter and $L^*$ can be any quantity which is strongly correlated with $L_E$. Details are given in Appendix B.
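A minimal sketch of this trick in Python, using the estimator $E_{MP}$ and the optimal $\lambda^*$ of Appendix B with $L^* = L_{12}$. The function names are ours, and the synthetic data in the usage example are made up purely for illustration. In Appendix B, $\lambda$ is a free parameter, and for any fixed $\lambda$ the estimator is unbiased; estimating $\lambda$ from the same sample, as done here for convenience, introduces a small bias that vanishes as $M$ grows.

```python
import math
import random


def sample_mean(v):
    return sum(v) / len(v)


def sample_cov(a, b):
    """Unbiased sample covariance of two equally long sequences."""
    ma, mb = sample_mean(a), sample_mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def control_variate_estimate(LE, L12, mean_L12):
    """Average of E_MP = lambda*<L12> + L_E - lambda*L12 over the sample.

    LE, L12:  per-instance values of L_E and of the control variate L12
    mean_L12: the exactly known ensemble average <L12>
    lambda is set to the sample estimate of lambda* = C(L_E, L12) sigma_LE / sigma_L12.
    Returns (estimate, plain standard error, reduced standard error, lambda).
    """
    M = len(LE)
    var_e, var_12, cov = sample_cov(LE, LE), sample_cov(L12, L12), sample_cov(LE, L12)
    lam = cov / var_12                            # equals C * sigma_LE / sigma_L12
    corr = cov / math.sqrt(var_e * var_12)
    estimate = lam * mean_L12 + sample_mean(LE) - lam * sample_mean(L12)
    se_plain = math.sqrt(var_e / M)               # sigma_LE / sqrt(M)
    se_reduced = se_plain * math.sqrt(1.0 - corr**2)
    return estimate, se_plain, se_reduced, lam


if __name__ == "__main__":
    rng = random.Random(0)
    M = 5000
    L12 = [rng.gauss(10.0, 1.0) for _ in range(M)]          # made-up control variate
    LE = [0.9 * v + rng.gauss(0.0, 0.4) for v in L12]       # made-up correlated "tour lengths"
    print(control_variate_estimate(LE, L12, mean_L12=10.0))
```

With these made-up numbers the correlation is about 0.91, so the standard error shrinks by a factor $\sqrt{1 - C^2} \approx 0.41$.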



A second and more basic problem is that it is computationally costly to determine the optimal tour lengths for a large number of instances, precisely because the TSP is an NP-complete problem. The most sophisticated "branch and cut" algorithms can take minutes on a workstation to solve a single instance of size $N < 100$ to optimality. However, we do not need to guarantee optimality: the statistical error in $\beta_E(N,d)$ already limits the quality of our estimate, and so an additional (systematic) error in $L_E$ is admissible as long as it is negligible compared to the statistical error. We may thus use fast heuristics to measure $L_E$, rather than exact but slower algorithms. This is discussed further in Appendix C.



Appendix B 

Statistical errors and a variance reduction trick 

Consider estimating $\langle L_E(N,d)\rangle$ at a given $N$ by sampling over many instances. If we have $M$ independent instances, the simplest estimator for $\langle L_E(N,d)\rangle$ is $\overline{L_E}(N,d)$, the numerical average over the $M$ instances of the minimum tour lengths. This estimator has an expected statistical error $\sigma(M) = \sigma_{L_E}/\sqrt{M}$, where $\sigma_{L_E}$ is the instance-to-instance standard deviation of $L_E$.

Now let us define $L_k$ to be the sum, over all cities, of $k$th nearest neighbor distances. $\langle L_k\rangle$ is its ensemble average; in terms of the notation used earlier in the text, $\langle L_k\rangle = N\langle D_k\rangle$. It has been noted by Sourlas [30] that $L_E$ is strongly correlated with $L_1$, $L_2$ and $L_3$. He therefore suggested reducing the statistical error in $\langle L_E\rangle$ using the estimator



$$E_S = \langle L_{123}\rangle\,\frac{L_E}{L_{123}} \qquad (B.1)$$



where $L_{123}$ is the arithmetic mean of $L_1$, $L_2$ and $L_3$. The ensemble average $\langle L_{123}\rangle$ can be calculated analytically from the $k$th nearest neighbor distribution, and so the variance of $E_S$ comes from fluctuations in the ratio $L_E/L_{123}$. If $L_E$ were a constant factor times $L_{123}$, this estimator would of course be perfect, i.e., it would have zero variance. This is not the case, however, and furthermore the use of a ratio biases the Sourlas estimator: its true mathematical expectation value differs from $\langle L_E(N,d)\rangle$ by $O(1/N)$. To improve upon this, we have introduced our own bias-free estimator

$$E_{MP} = \lambda\langle L_{12}\rangle + L_E - \lambda L_{12} \qquad (B.2)$$

where $L_{12}$ is the arithmetic mean of $L_1$ and $L_2$, and $\lambda$ is a free parameter. For any fixed $\lambda$ its expectation is exactly $\langle L_E\rangle$, since $\lambda L_{12}$ has mean $\lambda\langle L_{12}\rangle$. Our estimator has a reduced variance because $L_E$ and $L_{12}$ are correlated. It is easy to show that the variance of $E_{MP}$ is minimized at a unique value of $\lambda$, $\lambda^* = C(L_E, L_{12})\,\sigma_{L_E}/\sigma_{L_{12}}$, where $C(A,B) = \langle (A - \langle A\rangle)(B - \langle B\rangle)\rangle/(\sigma_A\sigma_B)$ is the correlation coefficient of $A$ and $B$. The variance then becomes $\sigma^2 = \sigma_{L_E}^2\,[1 - C^2(L_E, L_{12})]/M$. Empirically, we have found this variance reduction procedure to be quite effective, since $\sqrt{1 - C^2} \approx 0.38$ at $d = 2$ and $\sqrt{1 - C^2} \approx 0.31$ at $d = 3$. The statistical error is thus reduced by about a factor of 3; this means that for a given error, computing time is reduced by about a factor of 10.

Appendix C

Control of systematic errors

Our procedure for estimating $L_E$ at a given instance involves running a good heuristic $m$ times from random starts on that instance, and taking the best tour length found in those $m$ trials. The expected systematic error can be found from the frequencies with which each local optimum appears in a large number of test trials. (This large number must be much greater than $m$, the actual number of trials used in production runs.) The measurement is performed on a sufficiently large sample of instances, from which we extract the average size of the systematic error in $\langle L_E(N,d)\rangle$ as a function of $m$. We have found that in practice, this error is dominated by those infrequent instances where a sub-optimal tour is obtained with the highest frequency.
As $N$ increases, the probability of not finding the true optimum increases rather fast; for a given heuristic, it is thus necessary to increase $m$ with $N$ in such a way that the systematic error remains much smaller than the statistical error. If the heuristic is not powerful enough, the required $m$ will be too large for the computational resources. For our purposes, we have found that the Lin-Kernighan heuristic [1] is powerful enough for the smaller values of $N$ ($N \le 17$). For $20 \le N \le 100$, it was more efficient to switch to Chained Local Optimization (CLO) [2, 32], a more powerful heuristic which can be thought of as a generalization of simulated annealing. (When the temperature parameter is set to zero so that no up-hill moves are accepted, as was the case for our runs, CLO with embedded Lin-Kernighan is called "Iterated Lin-Kernighan" [33, 34].) With these choices, using in two dimensions $m = 10$ for $N \le 17$ (LK), $m = 5$ for $N = 30$ and $m = 20$ for $N = 100$ (CLO), we have kept systematic errors to under 10% of the statistical errors.
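The step from observed local optimum frequencies to an expected systematic error can be sketched as follows, under the assumption (ours) that the $m$ production runs are independent and land on each local optimum with the frequencies observed in the test trials, the best tour seen in the test trials being taken as the optimum. If the distinct lengths are sorted as $L_0 < L_1 < \cdots$ and $q_j$ is the fraction of trials ending at a length of at least $L_j$, the best of $m$ runs exceeds $L_0$ on average by $\sum_{j \ge 1} (L_j - L_{j-1})\,q_j^{\,m}$. The function name is hypothetical.

```python
def expected_excess(lengths, counts, m):
    """Expected (best of m runs) minus (best tour seen), from test-trial frequencies.

    lengths: tour lengths of the distinct local optima seen on one instance
    counts:  number of test trials that ended at each of them
    Assumes independent runs and treats the shortest tour seen as the optimum.
    """
    levels = sorted(zip(lengths, counts))
    total = sum(c for _, c in levels)
    excess, tail = 0.0, total
    for j in range(1, len(levels)):
        tail -= levels[j - 1][1]        # trials ending at length >= levels[j][0]
        q = tail / total                # probability that one run is that bad or worse
        excess += (levels[j][0] - levels[j - 1][0]) * q**m
    return excess


if __name__ == "__main__":
    # Made-up frequencies: optimum seen in 1 of 10 trials, a slightly worse tour in 9.
    for m in (1, 5, 20):
        print(m, expected_excess([10.0, 10.1], [1, 9], m))
```

In the made-up example the excess decays only as $0.9^m$, which illustrates why instances where a sub-optimal tour is the most frequent outcome dominate the average systematic error.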

Appendix D 

Bounding $\beta_E(d)$ using the bipartite matching problem

Given two sets of $N$ points $P_1, \ldots, P_N$ and $Q_1, \ldots, Q_N$ in $d$-dimensional Euclidean space, the bipartite matching (BM) problem asks for the minimum matching cost $L_{BM}$ between the $P_i$'s and the $Q_i$'s, with the constraint that only links of the form $P - Q$ are allowed. The cost of a matching is equal to the sum of the distances between matched pairs of points. When points $P_i$ and $Q_i$ are chosen at random in a $d$-dimensional unit hypercube, it is natural to expect $L_{BM}/N^{1-1/d}$ to be self-averaging as $N \to \infty$. To date, a proof of this property has not been given, even though the self-averaging of the analogous quantity in the more general matching problem (where links $P - P$ and $Q - Q$ are allowed as well) can be shown at all $d$ in essentially the same way as for the TSP, following arguments developed by Steele [7]. For $d = 1$, it is in fact known that self-averaging fails in the BM. For large $d$, however, let us assume that $L_{BM}/N^{1-1/d}$ does converge to some $\beta_{BM}(d)$ in the large $N$ limit.

We shall now derive a bound for the Euclidean TSP constant $\beta_E(d)$ in terms of $\beta_{BM}(d)$. Consider $K$ disjoint sets $S_1, \ldots, S_K$, together forming a large set $S = S_1 \cup \cdots \cup S_K$, and let each $S_i$ contain $N$ random points in the $d$-dimensional unit hypercube. Construct the $K$ minimum matchings $S_1 - S_2$, $S_2 - S_3$, ..., $S_{K-1} - S_K$ and $S_K - S_1$. Starting at any point in $S_1$, generate a loop (a closed path) in $S$ by following the matchings $S_1 - S_2$, $S_2 - S_3$, ... until the path returns to its starting point. The set of all such distinct loops $\Omega_1, \ldots, \Omega_M$ ($M \le N$) then covers the set $S$, each point lying on exactly one loop, and furthermore the sum of the loop lengths is equal to the sum of all minimum matching costs $\sum_i (L_{BM})_{S_i - S_{i+1}}$. (Note that $(L_{BM})_{S_K - S_{K+1}}$ is defined as $(L_{BM})_{S_K - S_1}$.)
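The loop construction can be checked mechanically. The sketch below (ours, with hypothetical function names) uses random perfect matchings rather than minimum ones, since the two properties used here, that the loops partition $S$ and that their total length equals the total matching cost, hold for any perfect matchings; computing minimum bipartite matchings (for example with an assignment solver such as SciPy's `linear_sum_assignment`) is left out.

```python
import math
import random


def build_loops(sets, matchings):
    """Follow the matchings S_1 -> S_2 -> ... -> S_K -> S_1 starting from points of S_1.

    sets[i] is a list of N points; matchings[i][a] is the index of the point of
    sets[(i + 1) % K] matched to sets[i][a].  Returns loops as lists of (i, a) pairs.
    """
    K, N = len(sets), len(sets[0])
    seen, loops = set(), []
    for start in range(N):
        if (0, start) in seen:
            continue
        loop, i, a = [], 0, start
        while True:
            loop.append((i, a))
            seen.add((i, a))
            i, a = (i + 1) % K, matchings[i][a]   # step along the matching out of S_i
            if (i, a) == (0, start):
                break
        loops.append(loop)
    return loops


def loop_length(loop, sets):
    """Length of a closed loop, including the link back to its starting point."""
    pts = [sets[i][a] for i, a in loop]
    return sum(math.dist(pts[k - 1], pts[k]) for k in range(len(pts)))


def total_matching_cost(sets, matchings):
    K = len(sets)
    return sum(math.dist(sets[i][a], sets[(i + 1) % K][b])
               for i in range(K) for a, b in enumerate(matchings[i]))


if __name__ == "__main__":
    rng = random.Random(5)
    K, N = 3, 6
    sets = [[(rng.random(), rng.random()) for _ in range(N)] for _ in range(K)]
    matchings = [rng.sample(range(N), N) for _ in range(K)]   # random perfect matchings
    loops = build_loops(sets, matchings)
    print(len(loops), sum(loop_length(l, sets) for l in loops), total_matching_cost(sets, matchings))
```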

Now, consider the optimum TSP tour through all the points of $S_1$. Construct a giant closed path visiting every point in $S$ at least once, by substituting into this TSP tour the loops $\Omega_1, \ldots, \Omega_M$ in place of their starting points in $S_1$. Using standard techniques (shortcutting past points that have already been visited, which by the triangle inequality cannot increase the length), we can construct from this path of $(K+1)N$ steps a shorter closed path of $KN$ steps which visits every point in $S$ exactly once. For the Euclidean TSP tour length $L_E$, we then obtain the inequality

$$(L_E)_S \le (L_E)_{S_1} + \sum_{i=1}^{K} (L_{BM})_{S_i - S_{i+1}}. \qquad (D.1)$$

If $S$ consists of random points chosen independently and uniformly in the unit hypercube, then averaging over all configurations, dividing by $N^{1-1/d}$ and taking the limit $N \to \infty$, we find

$$K^{1-1/d}\,\beta_E(d) \le \beta_E(d) + K\,\beta_{BM}(d). \qquad (D.2)$$
Letting $K = d$, this gives in the large $d$ limit $\beta_E(d) \le \beta_{BM}(d)$. Based on analogies with other combinatorial optimization problems […], $\beta_{BM}(d)$ is expected to scale as $\sqrt{d/(2\pi e)}$ when $d \to \infty$. In that case, $\beta_E(d)$ too must satisfy the Bertsimas-van Ryzin conjecture, $\beta_E(d) \sim \sqrt{d/(2\pi e)}$ as $d \to \infty$.
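The step from (D.2) to $\beta_E(d) \le \beta_{BM}(d)$ can be spelled out. For $K > 1$, rearranging (D.2) gives

$$\beta_E(d) \le \frac{K\,\beta_{BM}(d)}{K^{1-1/d} - 1} = \beta_{BM}(d)\,\frac{K^{1/d}}{1 - K^{-(1-1/d)}},$$

and with $K = d$ the factor $d^{1/d}/\bigl(1 - d^{-(1-1/d)}\bigr)$ tends to 1 as $d \to \infty$, since $d^{1/d} \to 1$ and $d^{-(1-1/d)} \to 0$.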






References 

[1] Lin S. and Kernighan B., Operations Res. 21 (1973) 498-516. 

[2] Martin O.C. and Otto S.W., Ann. Operations Res. 63 (1996) 57-75. 

[3] Mezard M. and Parisi G., Europhys. Lett. 2 (1986) 913-918. 

[4] Krauth W. and Mezard M., Europhys. Lett. 8 (1989) 213-218. 

[5] Beardwood J., Halton J.H. and Hammersley J.M., Proc. Cambridge Philos. Soc. 55 (1959) 
299-327. 

[6] Karp R.M. and Steele M., in The Traveling Salesman Problem, E.L. Lawler, J.K. Lenstra, 
A.H.G. Rinnooy Kan and D.B. Shmoys, Eds. (John Wiley and Sons, New York, 1985). 

[7] Steele M., Ann. Probability 9 (1981) 365-376. 

[8] Armour R.S. and Wheeler J.A., Am. J. Phys. 51 (1983) 405-406. 

[9] Stein D., Ph.D. thesis, Harvard University (1977). 

[10] Lin S., Bell Syst. Tech. J. 44 (1965) 2245-2269. 

[11] Ong H.L. and Huang H.C., Eur. J. Operational Res. 43 (1989) 231-238. 

[12] Fiechter C.N., Discrete Appl. Math. 1 (1994) 243-267. 

[13] Lee J. and Choi M.Y., Phys. Rev. E 50 (1994) R651-R654. 

[14] Percus A.G. and Martin O.C., Phys. Rev. Lett. 76 (1996) 1188-1191.

[15] Johnson D.S., McGeoch L.A. and Rothberg E.E., "Asymptotic Experimental Analysis 
for the Held-Karp Traveling Salesman Bound", 7th Annual ACM-SIAM Symposium on 
Discrete Algorithms (Atlanta, GA, 1996) pp. 341-350. 

[16] Rhee W.T. and Talagrand M., Ann. Probability 17 (1989) 1-8. 

[17] Bertsimas D.J. and van Ryzin G., Operations Res. Lett. 9 (1990) 223-231. 

[18] Smith W.D., Ph.D. thesis, Princeton University (1989). 




[19] Kirkpatrick S. and Toulouse G., J. Phys. France 46 (1985) 1277-1292. 

[20] Vannimenus J. and Mezard M., J. Phys. Lett. France 45 (1984) L1145-L1153. 

[21] Papadimitriou C.H. and Steiglitz K., Combinatorial Optimization: Algorithms and Complexity (Prentice Hall, Englewood Cliffs, NJ, 1982).

[22] Krauth W., Ph.D. thesis, Universite Paris-Sud (1989). 

[23] Mezard M. and Parisi G., J. Phys. Lett. France 46 (1985) L771-L778.

[24] Orland H., J. Phys. Lett. France 46 (1985) L763-L770. 

[25] Baskaran G., Fu Y. and Anderson P.W., J. Stat. Phys. 45 (1986) 1-25.

[26] Mezard M., Parisi G. and Virasoro M.A., Eds., Spin Glass Theory and Beyond (World 
Scientific, Singapore, 1987). 

[27] Mezard M., Parisi G. and Virasoro M.A., Europhys. Lett. 1 (1986) 77-82. 

[28] Brunetti R., Krauth W., Mezard M. and Parisi G., Europhys. Lett. 14 (1991) 295-301. 

[29] Johnson D.S. and McGeoch L.A., in Local Search in Combinatorial Optimization, 
E.H.L. Aarts and J.K. Lenstra, Eds. (John Wiley and Sons, New York, 1997), to appear. 

[30] Sourlas N., Europhys. Lett. 2 (1986) 919-923. 

[31] Martin O.C. and Percus A.G., Eur. J. Operational Res. (1996), submitted. 

[32] Martin O., Otto S.W. and Felten E.W., Complex Systems 5 (1991) 299-326.

[33] Johnson D.S., in Proceedings of the 17th Colloquium on Automata, Languages, and Programming (Springer-Verlag, Berlin, 1990) pp. 446-461.

[34] Martin O., Otto S.W. and Felten E.W., Operations Res. Lett. 11 (1992) 219-224.






